AITopics | representative sample

Collaborating Authors

representative sample

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Online Convex Matrix Factorization with Representative Regions

Jianhao Peng, Olgica Milenkovic, Abhishek Agarwal

Neural Information Processing SystemsFeb-15-2026, 07:40:27 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, dataset, matrix factorization, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

2b1d1e5affe5fdb70372cd90dd8afd49-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 11:04:52 GMT

diffusion model, pathologist feedback, synthetic image, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Virginia (0.04)
Europe > Ireland > Munster (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.71)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

Mass Distribution versus Density Distribution in the Context of Clustering

Ting, Kai Ming, Zhu, Ye, Zhang, Hang, Liang, Tianrun

arXiv.org Machine LearningJan-26-2026

This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has its fundamental limitation -- high-density bias, irrespective of the algorithms used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm which maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be easily changed to maximize the total density of all clusters in order to examine the fundamental limitation of using density distribution versus mass distribution. The key advantage of the MMC over the density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2601.10759

Country:

North America > United States (0.46)
Asia (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback

Neural Information Processing SystemsOct-8-2025, 08:52:37 GMT

For each image, the pathologist provides binary feedback, labeling a synthetic image as "1" if it fails to meet all of the plausibility criteria.

diffusion model, pathologist feedback, synthetic image, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Virginia (0.04)
Europe > Ireland > Munster (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

Online Clustering of Seafloor Imagery for Interpretation during Long-Term AUV Operations

Liang, Cailei, Bodenmann, Adrian, Fenton, Sam, Thornton, Blair

arXiv.org Artificial IntelligenceSep-9-2025

Abstract--As long-endurance and seafloor-resident AUVs become more capable, there is an increasing need for extended, real-time interpretation of seafloor imagery to enable adaptive missions and optimise communication efficiency. Although offline image analysis methods are well established, they rely on access to complete datasets and human-labelled examples to manage the strong influence of environmental and operational conditions on seafloor image appearance--requirements that cannot be met in real-time settings. T o address this, we introduce an online clustering framework (OCF) capable of interpreting seafloor imagery without supervision, that is designed to operate in real-time on continuous data streams in a scalable, adaptive, and self-consistent manner . The method enables the efficient review and consolidation of common patterns across the entire data history in constant time by identifying and maintaining a set of representative samples that capture the evolving feature distribution, supporting dynamic cluster merging and splitting without reprocessing the full image history. We evaluate the framework on three diverse seafloor image datasets, analysing the impact of different representative sampling strategies on both clustering accuracy and computational cost. The OCF achieves the highest average F1 score of 0.68 across the three datasets among all comparative online clustering approaches, with a standard deviation of 3% across three distinct survey trajectories, demonstrating its superior clustering capability and robustness to trajectory variation. In addition, it maintains consistently lower and bounded computational time as the data volume increases. Compared to offline clustering methods, it strikes a favourable balance between accuracy and efficiency. These properties are beneficial for generating survey data summaries and supporting informative path planning in long-term, persistent autonomous marine exploration.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.06678

Country: Asia > Japan (0.28)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Add feedback

Online Convex Matrix Factorization with Representative Regions

Jianhao Peng, Olgica Milenkovic, Abhishek Agarwal

Neural Information Processing SystemsAug-20-2025, 10:42:44 GMT

Matrix factorization (MF) is a versatile learning method that has found wide applications in various data-driven disciplines.

algorithm, dataset, matrix factorization, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation Ping Hu1 Stan Sclaroff 1 Kate Saenko 1, 2 1 Boston University 2

Neural Information Processing SystemsAug-17-2025, 08:13:33 GMT

Though achieving promising results, these methods may still suffer from several limitations.

machine learning, natural language, segmentation, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Predictive Representativity: Uncovering Racial Bias in AI-based Skin Cancer Detection

Morales-Forero, Andrés, Rueda, Lili J., Herrera, Ronald, Bassetto, Samuel, Coatanea, Eric

arXiv.org Machine LearningJul-22-2025

Artificial intelligence (AI) systems increasingly inform medical decision-making, yet concerns about algorithmic bias and inequitable outcomes persist, particularly for historically marginalized populations. This paper introduces the concept of Predictive Representativity (PR), a framework of fairness auditing that shifts the focus from the composition of the data set to outcomes-level equity. Through a case study in dermatology, we evaluated AI-based skin cancer classifiers trained on the widely used HAM10000 dataset and on an independent clinical dataset (BOSQUE Test set) from Colombia. Our analysis reveals substantial performance disparities by skin phototype, with classifiers consistently underperforming for individuals with darker skin, despite proportional sampling in the source data. We argue that representativity must be understood not as a static feature of datasets but as a dynamic, context-sensitive property of model predictions. PR operationalizes this shift by quantifying how reliably models generalize fairness across subpopulations and deployment contexts. We further propose an External Transportability Criterion that formalizes the thresholds for fairness generalization. Our findings highlight the ethical imperative for post-hoc fairness auditing, transparency in dataset documentation, and inclusive model validation pipelines. This work offers a scalable tool for diagnosing structural inequities in AI systems, contributing to discussions on equity, interpretability, and data justice and fostering a critical re-evaluation of fairness in data-driven healthcare.

data mining, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2507.14176

Country:

South America > Colombia (0.24)
North America > Canada > Quebec > Montreal (0.04)
Oceania > Australia (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Dermatology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science > Data Mining (0.93)
(2 more...)

Add feedback

Robust Emotion Recognition via Bi-Level Self-Supervised Continual Learning

Ahmad, Adnan, Nakisa, Bahareh, Rastgoo, Mohammad Naim

arXiv.org Artificial IntelligenceMay-20-2025

Emotion recognition through physiological signals such as electroencephalogram (EEG) has become an essential aspect of affective computing and provides an objective way to capture human emotions. However, physiological data characterized by cross-subject variability and noisy labels hinder the performance of emotion recognition models. Existing domain adaptation and continual learning methods struggle to address these issues, especially under realistic conditions where data is continuously streamed and unlabeled. To overcome these limitations, we propose a novel bi-level self-supervised continual learning framework, SSOCL, based on a dynamic memory buffer. This bi-level architecture iteratively refines the dynamic buffer and pseudo-label assignments to effectively retain representative samples, enabling generalization from continuous, unlabeled physiological data streams for emotion recognition. The assigned pseudo-labels are subsequently leveraged for accurate emotion prediction. Key components of the framework, including a fast adaptation module and a cluster-mapping module, enable robust learning and effective handling of evolving data streams. Experimental validation on two mainstream EEG tasks demonstrates the framework's ability to adapt to continuous data streams while maintaining strong generalization across subjects, outperforming existing approaches.

artificial intelligence, machine learning, semanticscholar, (16 more...)

arXiv.org Artificial Intelligence

2505.10575

Genre: Research Report (1.00)

Industry:

Transportation > Air (0.93)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces

Barbosa, Juliana, Gondhali, Ulhas, Petrossian, Gohar, Sharma, Kinshuk, Chakraborty, Sunandan, Jacquet, Jennifer, Freire, Juliana

arXiv.org Artificial IntelligenceMay-1-2025

Wildlife trafficking remains a critical global issue, significantly impacting biodiversity, ecological stability, and public health. Despite efforts to combat this illicit trade, the rise of e-commerce platforms has made it easier to sell wildlife products, putting new pressure on wild populations of endangered and threatened species. The use of these platforms also opens a new opportunity: as criminals sell wildlife products online, they leave digital traces of their activity that can provide insights into trafficking activities as well as how they can be disrupted. The challenge lies in finding these traces. Online marketplaces publish ads for a plethora of products, and identifying ads for wildlife-related products is like finding a needle in a haystack. Learning classifiers can automate ad identification, but creating them requires costly, time-consuming data labeling that hinders support for diverse ads and research questions. This paper addresses a critical challenge in the data science pipeline for wildlife trafficking analytics: generating quality labeled data for classifiers that select relevant data. While large language models (LLMs) can directly label advertisements, doing so at scale is prohibitively expensive. We propose a cost-effective strategy that leverages LLMs to generate pseudo labels for a small sample of the data and uses these labels to create specialized classification models. Our novel method automatically gathers diverse and representative samples to be labeled while minimizing the labeling costs. Our experimental evaluation shows that our classifiers achieve up to 95% F1 score, outperforming LLMs at a lower cost. We present real use cases that demonstrate the effectiveness of our approach in enabling analyses of different aspects of wildlife trafficking.

classifier, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3725256

2504.21211

Country: